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Abstract 

When sampling the distribution P(4>) oc exp(—\A<j)\ 2 ), a global 
heatbath normally proceeds by solving the linear system A<p = if, 
where ff is a normal Gaussian vector, exactly. This paper shows how 
to preserve the distribution P((j>) while solving the linear system with 
arbitrarily low accuracy. Generalizations are presented. 



PACS numbers: 02.70.L, 02.50.N, 52.65.P, 11.15.H, 12.38. G 



In Monte Carlo simulations, it is frequently the case that one wants to 
sample a vector (ft from a distribution of the Gaussian type oc exp(—\A(ft\ 2 ). 
Typically, (ft has many components, and A is a large, sparse matrix. In lattice 
field theory, (ft is the value of the continuum field (ft at regular grid points, 
and A is the discretized version of some differential operator A. Illustrative 
examples used in this paper are A = m + ip (free field) and A = m + ip 1 
(Dirac operator). The goal of the Monte Carlo simulation is to provide 
independent configurations of (ft at the least cost. 

The brute-force approach consists of drawing successive random vec- 
tors ff k "> from the normal Gaussian distribution exp(— \vj\ 2 ), and of solving 
A(ft^ = ff k \ The solution of this linear system can be efficiently ob- 
tained with an iterative linear solver (Conjugate Gradient if A is Hermi- 
tian, BiCGStab otherwise). This approach can be called a global heatbath, 
because has no memory of (ft^: the heatbath has touched all the 

components of (ft. To avoid a bias, the solver must be iterated to full con- 
vergence, which is often prohibitively expensive. One may try to limit the 
accuracy while maintaining the bias below statistical errors, but this re- 
quires a delicate compromise difficult to tune a priori. A notable example 
of this global heatbath approach is the stochastic evaluation of inverse Dirac 
matrix elements, where several hundred "noise vectors" ff k ^ are inverted to 
yield (A^A)^ 1 m (<fai<pj)k- An abundant literature has been devoted to the 
optimization of this procedure §]. 

For the free field or the Dirac operator mentioned above, the number 
of iterations of the solver required to reach a given accuracy grows like the 
correlation length £ = l/m. Thus the work per new, independent (ft is c £ z 
where z, the dynamical critical exponent, is 1. However, the pref actor c 
is large. For this reason, local updates, where only one component of (ft is 
changed at a time, are often preferred. They usually provide an indepen- 
dent (ft after an amount of work c' £ 2 , but with a much smaller prefactor d 
P]. This paper presents an adaptation of the global heatbath which allows 
for arbitrarily low accuracy in the solution of A<ft = fj, thus reducing the 
prefactor c, while maintaining the correct distribution. This is obtained by 
the introduction of an accept /reject test of the Metropolis type, making the 
procedure a "quasi-heatbath." The method is described in the next section. 
Generalizations, including a local version, are presented afterwards. 
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1 Quasi-Heatbath 



Efficient Monte Carlo often relies on the subtle introduction of auxiliary 
degrees of freedom. Consider here a vector \ distributed according to 
exp(— \x — Acf>\ 2 ). Note that Z x is a constant (tt n for an TV-component 

complex vector) independent of eft. Therefore, the original distribution of 4>, 
-J- exp(— Ij4(/)| 2 ), is unchanged by the introduction of %: 

1 r v$e-\ A $ 2 = -V / vfVxe-^-tt-^". (1) 



Z<j> J z 4> z x J 

We can now alternate Monte Carlo steps on (p and Xi with the following 
prescription: 

1. Perform a global heatbath on x : 

X^A0 + fj, (2) 
where ff is a normal Gaussian vector; 



2. Reflect Acfi with respect to the minimum of the quadratic form (|^4(/>| 2 + 
\X-A$\ 2 ): 

A$^x-A$, 

i.e., 

ji—A^X-j. (3) 

Step 2 conserves the probability of 4> but is not ergodic. Step 1 provides the 
ergodicity. Note that step 2 exchanges the two terms \ A(p\ 2 and \x~ A(j)\ 2 in 
the quadratic form. Since x ~ A<fc in step 1 is set to a new random vector ff, 
Acj) at the end of step 2 is equal to ff. Therefore, a completely decorrelated 
4> has been generated. The vector x 18 n °t needed any longer and can be 
discarded. 

This two-step algorithm can now be modified slightly. The vector A~ l x 
in Eq.(^) need not be computed exactly. Consider an approximate solution 
£ with AQ = x — r, where f / is the residual. Step 2 should now be 
considered as a way to propose a candidate (j)' = f — <j> in a Metropolis 
procedure. Since Q is completely independent of (p or eft 1 , the probability of 
proposing (j)' given (f> is the same as that of proposing given (j)'. Detailed 
balance will be satisfied with the additional step: 
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3. Accept the candidate <f)' = £ — (ft with probability 

Pacc(0-^) =min(l,e- AS ), (4) 

where AS = \A(ff + A$'\ 2 - \A$\ 2 - \x - A$\ 2 . 
Simple algebra shows that 

AS = 2Re(rt • (A(j> - A<p)) , 

which is antisymmetric under the exchange (f> <f/, as it should be. If the 
linear system AC, = x is solved exactly (r = 0) , then AS" = and one recovers 
the original global heatbath with acceptance 1. Otherwise, the candidate 
(p' may be rejected, in which case (f>^' must be included once more in the 
Monte Carlo sequence: cj)^ k+1 ^ = <fr( k \ As the residual is allowed to grow, 
the average acceptance falls. But no bias is introduced: the distribution of 
(p remains exp(— |A0| 2 ). 

The optimal magnitude of r is thus the result of a compromise between 
accuracy and acceptance. The average acceptance of the prescription (||) is 
erfc(\/((AS') 2 )/8) ||. Here ((AS") 2 ) can be evaluated as a function of the 
convergence criterion e of the linear solver. If the solver yields a residual r 
such that < e, then {(AS) 2 ) < 8Ne 2 , where A$, A<fJ, and r have been 
considered independent random Gaussian vectors with variance N, N, and 
2e 2 N, respectively, and is the number of their components. Therefore, 
the acceptance is simply 

(acceptance) w erfc(eviV). (5) 

In other words, the acceptance is entirely determined by e and N, the number 
of degrees of freedom (the volume) of the system, and is independent of the 
matrix A. To maintain a constant acceptance as the volume grows, the 
convergence criterion for the solution of A( = \ should vary like 1 / yN. An 
accuracy e ~ 10~ 3 — 10~ 4 provides an acceptance of 80-90% up to systems 
of 10 6 degrees of freedom. There is no need for higher accuracy. 

The convergence of an iterative solver is typically exponential: e n ~ 
e -™/5 after n iterations. Therefore, the above prescription reduces the work 
by a factor of 2 to 3 compared to the usual approach which iterates the solver 
until "full" convergence (which typically means e ^ 10~ 8 — 10~ 12 ). Illustra- 
tive results are shown in Fig. 1 for the case of the Wilson-Dirac operator. 
This figure shows the number of iterations, the acceptance, and the work 
per independent 4> as a function of e. The acceptance obeys erfc(c ey/N), 
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where c < 1 (0.75 here) reflects the fact that the residual is always smaller (c 
times smaller on average) than required by the stopping criterion. For this 
system of ./V = 49 152 variables (8 4 lattice), the optimal stopping criterion 
is near 10 -3 . 




Figure 1: As the stopping criterion in the iterative solver is varied, the 
number of solver iterations (top) and the acceptance of the quasi-heatbath 
(middle) change. The acceptance is well described by Eq.(|5|) (solid line). The 
work per new <j> (bottom) shows a clear minimum. The optimal stopping 
criterion depends on the system size only (49 152 here). 
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2 Generalizations 

A. Over- and under-relaxation 

Consider a modification of Eq.(||) with a parameter A: 

i- / V$ e-l^l 2 = - J- / Pfe e -l^-l«-^l a . (6) 

The same 3-step algorithm of Section 1 now reads: 

1. Heatbath on x*: 

X * — A Acft + ff; (7) 

2. Reflection of eft about the approximate minimum of the quadratic form: 

2A 



Y "1 + A2 

where AQ = \ — r; 



C - h (8) 



3. Accept eft' with probability Pa, cc ((ft — * (ft 1 ) = min(l,e where A5 = 
\A<ft'\ 2 + \x~ \A<ft'\ 2 — \A<j)\ 2 — \x*— XA<ft\ 2 ; and, by simple algebra, 

AS = 2XRe{r^ ■ [A(ft- Acft')). (9) 

Thus, as A decreases from 1, ((AS) 2 } also decreases, which boosts the 
acceptance. On the other hand, Eq.@ indicates that eft' approaches — (ft as 
A — > 0, so that (ft 1 and (ft become very (anti)correlated. The parameter A 
allows interpolation between simple reflection (A = 0) and no motion at all 
(A = +oo). In fact, substituting Eq.(^) into Eq.(^) gives 

1 _ \2 9X 

4>' = -j^* + YTx 2{A ^~^ m 

Taking r = 0, one can identify this prescription with that of Adler's stochas- 
tic over-relaxation (AOR) 0]: 

$' = (1 - w)$ + ^Ju(2 - u)A- l rj, (11) 

provided u> = jr^s- The quasi-heatbath can be viewed as a flexible, global 
generalization of Adler's AOR. 
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It is clear from Eq.(||) that A < 1 allows for a looser convergence crite- 
rion e ~ 1/A. However, the work to reach convergence typically grows like 
—log e, whereas Eq.(|lO|) indicates that the number of Monte Carlo steps to 
decorrelate (j) will grow like j?. Therefore, it seems inadvisable to depart 
from A = 1. 

Nonetheless, there are many situations where a completely independent 
(j) at each Monte Carlo step is a wasteful luxury. When the matrix A fluc- 
tuates and depends on other variables U, it will take some time for the U 
to equilibrate in the new background cfi( k+1 ^ . Equilibration will be achieved 
quickly over short distances, more slowly over large ones. In that case it 
is useful to refresh the short-wavelength modes of 4> a t every MC step, but 
not the long- wavelength ones. The situation is similar for the stochastic 
evaluation of inverse Dirac matrix elements: one is interested in estimating 
(A^A)^ 1 , where the distance \i — j\ is short. Refreshing the long- wavelength 
modes every time is wasteful. 

B. Selective mode refresh 

The quasi-heatbath may be tailored for this purpose by modifying the basic 
Eq.® to 

— (V4> e~^ 2 = — [ V$Vx e-^l 2 -!*-^! 2 . (12) 
Z<j> J * 

The matrix C plays the role of the earlier XA, except that now A depends 
on the eigenmode considered. The three basic steps of the algorithm become: 

1. Heatbath on x : 

X^Cj+v; (13) 

2. Reflection of eft about the approximate minimum of the quadratic form: 

0' = C - I (14) 

where 

^A + C^C)C = C^x-r; (15) 

3. Accept (j)' with probability P acc {<P ~> 4 >/ ) = min(l, e~^ s ), where 

AS = 2Re(rt-(0 -(£')). 

For simplicity, consider the case where C and A commute. The candidate 
4>' can be expressed as 
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$' = -{A^A + C^Cy x {A^A - C^C)$+ 2(A^A + &C)-\Cflij- r). 

One wishes to obtain a heatbath (A ~ 1 in Section A) on short- wavelength 
modes. This implies a cancellation of eigenvalues in (A' A — C^C) for short 
wavelengths. For long wavelengths a heatbath is not necessary, and one 
could have A ~ or +00. One possible way to implement this would be 

C = F^AFA, 

where F is the Fourier transform and A is a diagonal matrix with entries 
\{k) growing from to 1 with momentum \k\. However, for operators A 
of the free-field or Dirac type, a simpler and equivalent way consists of 
modifying the mass parameter m to rnc > rn. This is equivalent to A(|fe|) = 

\j{vr? c + k 2 )/(m 2 + k 2 ). 

The mass which enters into the linear system to solve dig) is m c g = 
J (m 2 + mj^)/2. As mc is increased, so is m e ff. The work to approximately 
solve Eq.(|l5|) decreases as l/m e ff. Therefore, one achieves the desired effect 
of refreshing short-wavelength modes at cheaper cost. By drawing mc ran- 
domly from a suitable distribution at each MC step, the tailored refreshing 
of all Fourier modes with the desired frequency can be achieved. 



C. Local version 

The quasi-heatbath described so far is a global update procedure: all com- 
ponents of 4> are updated together. A local version readily suggests it- 
self: restricting the auxiliary vector \ to have only 1 non-zero component, 
Xi = X $i,i ( or an y subset of components). 
Eq.(||) then becomes 

— fv$ e-l^l 2 = — / V$Vx e -l^l 2 -lx-W)*ol 2 . (16) 

The algorithm is unchanged: 
1. Heatbath on x- X i — (^40)io + Vi 



2. Approximate reflection of 0: 4>' = £ — 4>, where AC, = x ~ 

3. Accept <j)' with probability P acc (4> — * 0') = min(l,e~ A5 ), 
where AS = Re(rt • (A$-A$)) + Re(r£ • {{A$) io - {A$') io )). 
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In this case, £ is the approximate Green's function of A for a source at Iq. 
It will have a support of size so that the local change in Xi wn l induce 

a change in (j> over a whole domain. By varying iq from 1 to N, one sweeps 
the whole system and generates a new vector <f>( k+1 h If the acceptance 
is maintained close to 1, will essentially be uncorrelated with <f>( k \ 

However, the work per local update is proportional to £ rf in d dimensions, 
so that this approach becomes very inefficient when the correlation length £ 
is large. Nevertheless, it may be advantageous for moderate £. The reason 
is that the approximate solution £ m A^x^io) need not be obtained by a 
Krylov method, which applies successive powers of A to the initial residual. 
Instead, one may search for the best solution £ among all vectors of localized 
support, for instance, iq and its nearest neighbours. 

D. Adler's stochastic over-relaxation 

Finally, the local variable x <K*> *o) may interact with <f> in the simplest way, 
with a contact interaction. This modifies Eq.((l|) to 

— [v$ e-l^l 2 = — L- / VUX e -l^l 2 -lx-A^o)| 2 . (17) 



Ztj, J z <t> z x 

If one chooses to update only (f>(io) and leave the other components of eft 
unchanged, then there is no need to invert the matrix A. The algorithm 
simplifies to: 

1. Heatbath on x' X < — ^(io) + T; 

2. Reflection of <f>(io) with respect to the minimum of the quadratic form 

(m 2 + \ 2 )\<p(i )\ 2 + (<p(i ) j ■ (-0 - Xx) + H.c.) + constant, 

where m 2 = (A^A) ioio and ip = (A^ A) ioj <j)(j). 

This reflection is exact, and so the acceptance test disappears. The new 
reflected value is 

1 — A 2 2 2A 

= -TT^^-TTa^^ + TTa^^ 

This prescription is identical to Adler's stochastic over-relaxation Q with 
the change of notation u> <-> 2/(1 + A 2 ). 
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3 Conclusion 



The quasi-heatbath Eqs.(||)-(|4]) is a simple and efficient method to globally 
change a vector distributed according to -^-e^ A ^ 2 . Like the global heat- 
bath consisting of solving A<j) = if, where ff is a Gaussian vector, exactly 
at each Monte Carlo step, the quasi-heatbath also has dynamical critical 
exponent 1. The prefactor is reduced by a factor of 2 to 3 because the linear 
system A(j) = fj can now be solved approximately. Whatever the level of 
accuracy, an acceptance test maintains the exact distribution e - '" 4 ^' 2 . The 
most efficient choice for the accuracy level is G(l/y/~N), where N is the 
volume of the system. 

Several generalizations of the quasi-heatbath have been proposed. A 
simple modification makes it possible to refresh each of the Fourier compo- 
nents of A<j> at a prescribed rate. A local version may be advantageous when 
the correlation length is moderate. In a limiting case, this version becomes 
identical to Adler's stochastic over-relaxation. 



I thank Massimo D'Elia for interesting discussions and valuable com- 
ments. 
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